perf: cache per-render arrays/LINQ in DataGrid#669
Conversation
The DataGrid rebuilt arrays and re-ran LINQ on every render even when the underlying sort/filter/column/width state was unchanged. On the data-grid stress workload (StressPerf.ReactorGrid, the grid being benchmarked) this dominated Renders/sec, reconcile, diff, and memory. DataGridState.cs: - Add SortVersion/FilterVersion/ColumnVersion counters bumped by every mutation path (ToggleSort, SetFilter/ClearFilter/ClearAllFilters, Resize/Hide/Show/Reorder/Pin), used as cache keys. - O(1) lookups via dictionaries kept in sync with the lists: GetSortDirection (_sortDirByField), GetFilter (_filterByField), GetColumnWidth/PinColumn (_columnIndexByName), replacing per-column LINQ FirstOrDefault/FindIndex. - Columns getter serves a cached _visibleColumns list (version-keyed) instead of Where+ToList on every access. - GetColumnLayout(...) caches the per-column widths + a single shared GridDefinition keyed on ColumnVersion + shape; the header and every data row reuse the one definition reference so the reconciler skips re-applying ColumnDefinitions. - SelectRangeByKeyCache scans the internal row-key string[] by index instead of materializing the whole cache into a List<RowKey> on shift-click. DataGridComponent.cs: - Memoize the DataRequest (also stabilizes UseDataSource deps, fixing a spurious per-render pagination restart) and the sort key (UseMemo). - Root grid + data/header rows build GridElements directly from cached GridDefinitions (static root-row-def cache; shared column layout) instead of allocating string[]/double[] + per-column double.ToString each render. - Cache the once-wired LostFocus setter array in a UseRef. The new perf-only state members are internal (InternalsVisibleTo Reactor.Tests), so the public API surface is unchanged. #126 (per-row Element?[] pooling) and #131B (validation-summary cache) are intentionally not done; see the issue for the correctness rationale. Adds DataGridPerfCacheTests covering cache invalidation, O(1)-lookup parity with the old LINQ, the shared layout cache's reference stability, the visible-columns cache, and SelectRangeByKeyCache parity. Closes #663 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Review-phase fixes for PR #669 (pr-review skill + multi-model cross-check): - GetColumnLayout: key the width/GridDefinition cache on the caller-supplied columns list (reference + count), not only ColumnVersion+shape. ColumnVersion only tracks internal mutations (resize/hide/show/reorder/pin); a swapped el.Columns/auto-columns list could otherwise be served a stale, wrong-sized layout (and index out of range in the row renderer). [HIGH, confirmed] - DataGridComponent: hoist both editable-grid UseRef hooks out of the if (el.Editable) branch so the hook call order is stable when Editable toggles between renders (avoids HookOrderException). [HIGH] - Columns getter: build a fresh visible-column snapshot on invalidation instead of returning the internal list or clearing the cached buffer in place, so a previously returned IReadOnlyList is never mutated by a later column change (restores the snapshot semantics of the old Where(...).ToList()). [LOW] Tests: add coverage for layout-cache invalidation on a columns-reference/count change, content invalidation on reorder/pin/toggle-visibility, the Columns snapshot guarantee, unknown-column lookup parity, and SelectRange reversed/ missing/empty parity. Full Reactor.Tests: 9688 passed / 0 failed. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR targets DataGrid render-path allocation hotspots in src/Reactor/Controls/DataGrid/ by introducing version-keyed caches and O(1) lookups in DataGridState, and by memoizing / reusing stable layout objects in DataGridComponent. The intent is to reduce per-render LINQ/array churn without changing the public API surface.
Changes:
- Add sort/filter/column version counters and dictionary-based O(1) lookups in
DataGridState, plus cached visible-columns and cached column-layout (GridDefinition+ widths) keyed by versions. - Update
DataGridComponentto memoizeDataRequestand the legacysortKey, and to buildGridElements directly from cachedGridDefinitions (root grid + header/rows) to avoid per-render string/double arrays. - Add
DataGridPerfCacheTeststo validate cache invalidation, lookup parity with prior LINQ behavior, and range-selection parity.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| tests/Reactor.Tests/DataGridPerfCacheTests.cs | Adds focused unit tests validating the new caching/versioning behavior and ensuring output parity with previous LINQ-based logic. |
| src/Reactor/Controls/DataGrid/DataGridState.cs | Introduces version counters, lookup dictionaries, visible-columns caching, column-layout caching, and an allocation-free range selection path over the row-key cache. |
| src/Reactor/Controls/DataGrid/DataGridComponent.cs | Memoizes request/sort key and reuses cached GridDefinitions by constructing GridElement directly, reducing per-render allocations and helping reconciler skip definition reapplication. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Cache the search TextBox's onChanged handler in a UseRef so it keeps a stable reference across renders instead of allocating a fresh closure each time. It captures only the stable `state`, so a once-built handler is equivalent to rebuilding it every render. The ref is declared unconditionally (hooks must run in a stable order; ShowSearch can toggle), mirroring the #131A LostFocus-setter pattern. This is the only cell/toolbar-facing handler created in the component's hook context. Every other DataGrid handler (per-row click/expand/edit/select, header sort/resize/reorder/pin) is built inside the static, virtualized RenderRow / RenderHeaderRow methods, which run in VirtualList's renderItem factory with no hook context — UseCallback/UseRef cannot be called there, so they can't be stabilized without a separate state-backed delegate cache. On this branch the Element skip path (Element.CanSkipUpdate) compares callback *presence*, not delegate identity (ShallowEquals intentionally ignores delegate equality), so those fresh closures do not currently defeat the skip path; reference-stabilizing them only matters once the Core reference-comparison skip-path lands and is best done as a coordinated follow-up. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
/perf |
⚡ Reactor perf comparisonWorkload: Regression vs
|
| Metric | main (baseline) |
This PR | Δ (95% CI) | Status |
|---|---|---|---|---|
| Renders/sec ↑ | 2.82 | 2.77 | -1.1% 95% CI [-6.1, +3.8] | ≈ within noise |
| Avg Reconcile (ms) ↓ | 127.8 | 129.1 | +1.3% 95% CI [-2.2, +4.8] | ≈ within noise |
| Avg Diff (ms) ↓ | 116.4 | 118.2 | +1.2% 95% CI [-2.6, +5.0] | ≈ within noise |
| Avg Memory (MB) ↓ | 287.3 | 288.3 | -0.1% 95% CI [-0.8, +0.6] | ≈ within noise |
Low-mutation skip-floor (--percent 0)
At --percent 0 the workload mutates few cells per tick (always at least one), so reconcile/diff isolate the O(n) per-tick child skip-walk floor that higher mutation rates dilute — ChildReconciler re-walks every child each tick even when nothing moved. The closer --percent is to 0, the more this floor is the signal, so a structural-skip optimization shows up cleanly where the headline table above buries it. Δ is the mean paired change with a 95% CI.
| Metric | main (baseline) |
This PR | Δ (95% CI) | Status |
|---|---|---|---|---|
| Renders/sec ↑ | 16.62 | 16.58 | +0.9% 95% CI [-6.6, +8.5] | ≈ within noise |
| Avg Reconcile (ms) ↓ | 34.1 | 36.1 | +4.7% 95% CI [-4.9, +14.4] | ≈ within noise |
| Avg Diff (ms) ↓ | 32.2 | 34.2 | +4.4% 95% CI [-5.9, +14.8] | ≈ within noise |
| Avg Memory (MB) ↓ | 268.2 | 267.5 | -0.1% 95% CI [-0.4, +0.2] | ≈ within noise |
Allocation (Reactor) — lower is better
| Metric | main (baseline) |
This PR | Δ (95% CI) | Status |
|---|---|---|---|---|
| Alloc bytes/render ↓ | 5699901 | 5749942 | +0.2% 95% CI [-1.5, +1.9] | ≈ within noise |
| Gen0 GC / 1k renders ↓ | 207.14 | 222.53 | +1.8% 95% CI [-10.3, +13.9] | ≈ within noise |
Keyed-list workload (StressPerf.KeyedList, --percent 50)
A separate macro workload: a ~500-row stably keyed list whose rows are reordered / inserted / removed each tick. Because every child carries a key, the child reconciler takes its keyed arm (ReconcileKeyed → ReconcileKeyedMiddle, the LIS-based minimal-move pass) instead of the positional re-walk the StocksGrid tables above measure — so this is the sensitive macro signal for keyed-diff work the positional cells can never reach. Same interleaved paired-Δ 95% CI as the headline table.
| Metric | main (baseline) |
This PR | Δ (95% CI) | Status |
|---|---|---|---|---|
| Renders/sec ↑ | 22.37 | 22.38 | -0.5% 95% CI [-2.6, +1.6] | ≈ within noise |
| Avg Reconcile (ms) ↓ | 14.9 | 14.8 | +0.5% 95% CI [-1.4, +2.4] | ≈ within noise |
| Avg Diff (ms) ↓ | 14.7 | 14.6 | +0.5% 95% CI [-1.4, +2.3] | ≈ within noise |
| Avg Memory (MB) ↓ | 169.7 | 170.1 | -0.2% 95% CI [-0.9, +0.4] | ≈ within noise |
Allocation (keyed-list) — lower is better
| Metric | main (baseline) |
This PR | Δ (95% CI) | Status |
|---|---|---|---|---|
| Alloc bytes/render ↓ | 313624 | 313312 | 0.0% 95% CI [-0.4, +0.3] | ≈ within noise |
| Gen0 GC / 1k renders ↓ | 17.51 | 17.51 | +0.5% 95% CI [-1.6, +2.6] | ≈ within noise |
Reconciler micro-benchmarks (PerfBench.ControlModel)
Production --variant Reactor control-model path, ns-resolution and WinUI-undiluted (spec-047 M1–M13) — ↓ lower is better. Status tracks allocated bytes/op, the authoritative signal here; it is deterministic for structurally-fixed benches, while dispatcher / background-thread benches carry a small process-to-process offset, so a bench is flagged only when its 95% CI clears a ±3% minimum-effect band (real structural alloc changes are several percent to many-x). ns/op is shown for context but is not auto-flagged (its paired CI is rep-interleaved but the flag remains dormant pending a real-CI identical-binary band calibration). Δ is the mean paired change with a 95% CI.
| Bench | main ns/op |
Δ ns (95% CI) | main B/op |
Δ alloc (95% CI) | Status |
|---|---|---|---|---|---|
M1 Mount_Leaf_NoCallback |
149739.1 | -0.9% 95% CI [-3.7, +1.9] | 1140.9 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
M2 Mount_Leaf_OneCallback |
105856.2 | +3.2% 95% CI [-2.6, +9.1] | 3383.3 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
M3 Mount_Leaf_ThreeCallbacks |
214066.0 | +1.6% 95% CI [-1.5, +4.8] | 8484.1 | -0.5% 95% CI [-2.6, +1.7] | ≈ within noise |
M4 Dispatch_Switch_Cold |
101809.0 | +2.5% 95% CI [-5.0, +10.0] | 1767.8 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
M5 Dispatch_Switch_Warm |
105797.8 | -0.6% 95% CI [-3.9, +2.7] | 1766.0 | +0.7% 95% CI [-0.4, +1.9] | ≈ within noise |
M6 Dispatch_ExternalType |
91348.4 | +1.5% 95% CI [-2.2, +5.1] | 987.6 | +1.4% 95% CI [-0.7, +3.4] | ≈ within noise |
M7 Update_NoChange |
55084.0 | -0.1% 95% CI [-0.6, +0.5] | 452.1 | -4.5% 95% CI [-9.7, +0.7] | ≈ within noise |
M8 Update_OneLeafChanged |
41746.5 | +0.2% 95% CI [-0.7, +1.1] | 536.0 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
M9 Update_AllChanged |
2820576.6 | +0.6% 95% CI [-0.4, +1.7] | 184278.1 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
M10 EventHandlerState_Alloc |
82651.0 | +2.6% 95% CI [-5.9, +11.2] | 3095.2 | 0.0% 95% CI [-0.1, 0.0] | ≈ within noise |
M11 ModifierEHS_Frequency |
45894.4 | +0.3% 95% CI [-1.1, +1.7] | 638.9 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
M12 Pool_Rent_HotPath |
117007.6 | +0.3% 95% CI [-0.7, +1.2] | 1099.9 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
M13 Setters_Suppression_Scope |
107.5 | -0.5% 95% CI [-10.5, +9.6] | 26.7 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
M14 Dsl_Rebuild_Cascade |
1505326.0 | +0.9% 95% CI [-0.2, +1.9] | 2231828.9 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
C207 ChangeHandler_DpRead_Coalesce |
1217.8 | -3.7% 95% CI [-6.8, -0.6] | 0.6 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
OAlloc Optional_Element_Alloc |
220.0 | -0.8% 95% CI [-8.0, +6.3] | 528.0 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
OUpdate Optional_Reconciler_Update |
12050.3 | -0.5% 95% CI [-2.4, +1.4] | 2772.3 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
Cross-framework reference (same StocksGrid workload)
| Metric | vanilla WinUI3¹ | Rust windows-reactor² |
Reactor (this PR) |
|---|---|---|---|
| Renders/sec ↑ | 3.39 | 4.80 | 2.77 |
| Avg Reconcile (ms) ↓ | n/a | 19.1 | 129.1 |
| Avg Diff (ms) ↓ | n/a | 17.1 | 118.2 |
| Avg Memory (MB) ↓ | 263.5 | 197.3 | 288.3 |
↑ higher is better · ↓ lower is better. Within noise = the 95% confidence interval of the paired Δ includes 0 (no change resolvable at this sample size); ✅ improvement /
Allocation metrics (alloc bytes/render, Gen0 GC) are the sensitive signal for allocation-reduction work, where the mean-ms / memory figures are largely flat. They read n/a for a harness built from a revision that predates them (rebase the PR onto main to populate them).
Reconciler micro-benchmarks run PerfBench.ControlModel --variant Reactor (M1–M13) as a headless loop bracketed by per-thread alloc + GC counters — ns-resolution and free of WinUI render / working-set dilution, so they resolve Core/Reconciler allocation deltas the macro StocksGrid workload cannot. main and PR each link their own src/Reactor build and are rep-interleaved (a fresh alternated process per rep); Δ is the paired 95% CI over per-rep means. The Status column tracks allocated bytes/op (deterministic for identical code); ns/op is informational — its paired CI is now unbiased but the flag stays dormant pending a real-CI identical-binary band calibration.
¹ vanilla WinUI3 = StressPerf.Direct (imperative; no virtual-DOM, so it has no reconcile/diff phase — those cells read n/a). Measured live on this runner.
² Rust = test_reactor_perf from microsoft/windows-rs — a port of this harness (same StocksGrid, same --percent/--duration CLI). Built from source and measured live on this runner.
Absolute numbers are runner-dependent — trust the Δ vs main, not the absolute values. Memory (working set) is the noisiest metric.
Runner: CPU: AMD EPYC 7763 64-Core Processor · 4 logical cores · 16 GB RAM · runner: GitHub Actions 1043006548.
Generated by .github/workflows/perf-compare.yml · PR 25432e0 vs main e8572a0 · 2026-06-27T07:29:09Z · run log.
|
/perf |
|
/perf |
|
Closing as not-currently-measurable: the DataGrid control path is never entered by any /perf workload (StocksGrid uses a native Grid), and re-validation on current main read textbook-flat. This is an instrument gap, not a bad change — tracked as follow-up #732 (add a DataGrid stress workload, then re-measure and revive). The implementation remains on this branch for revival. |
|
/perf |
1 similar comment
|
/perf |
Closes #663
Problem
The DataGrid (
src/Reactor/Controls/DataGrid/) — which is the grid measured by theStressPerf.ReactorGriddata-grid stress workload — rebuilt arrays and re-ran LINQ on every render, even when the underlying sort/filter/column/width state was unchanged. These per-render allocations were first-order contributors to the C# vs Rust gap (Renders/sec, Avg Reconcile, Avg Diff, Avg Memory).Changes
All edits are confined to
src/Reactor/Controls/DataGrid/. New perf-only state members areinternal(covered byInternalsVisibleTo Reactor.Tests), so the public API surface is unchanged.DataGridState.csSortVersion/FilterVersion/ColumnVersion) bumped by every mutation path —ToggleSort,SetFilter/ClearFilter/ClearAllFilters,ResizeColumn/HideColumn/ShowColumn/ToggleColumnVisibility/ReorderColumn/PinColumn— used as cache keys. No-op mutations don't bump (preserving the existingStateChangedsemantics).GetSortDirection(_sortDirByField),GetFilter(_filterByField),GetColumnWidth/PinColumn(_columnIndexByName) — replacing per-column LINQFirstOrDefault/FindIndex(fix(devtools): accept HTTP.SYS-owned ports in PortOwnership check #128).Columnsgetter serves a cached_visibleColumnslist (version-keyed) instead ofWhere + ToListon every access (ci: bump fast-xml-parser, @react-native-community/cli and react-native-windows in /tests/stress_perf_rn/StocksGrid #127).GetColumnLayout(...)caches the per-column widths + a single sharedGridDefinitionkeyed onColumnVersion+ shape. The header row and every data row reuse the one definition reference, so the reconciler skips re-applyingColumnDefinitions(Spec 035 — Demo Script Tool sample app design #125).SelectRangeByKeyCachescans the internal row-keystring[]by index instead of materializing the whole cache into aList<RowKey>on shift-click (boxed one struct per row — 100k+ on large client-fallback loads) (UseAnnounce.Announce should marshal to the UI thread automatically #130).DataGridComponent.csDataRequest(Spec 034 — Element allocation reduction (EX1+EX3+EX4) + investigation #124) and the sort key (ci: bump fast-xml-parser, @react-native-community/cli and react-native-windows in /tests/stress_perf_rn/VirtualList #123) withUseMemokeyed on the version counters. TheDataRequestmemo also stabilizesUseDataSource's deps, fixing a latent spurious per-render pagination restart (the oldSorts.ToList()/Filters.ToList()produced new list references every render).GridElements directly from cachedGridDefinitions — a static root-row-def cache (docs(devtools-skill): clarify lockfile auth and add scripted-loop guidance #129) and the shared column layout (Spec 035 — Demo Script Tool sample app design #125) — instead of allocatingstring[]+double[]+ per-columndouble.ToStringeach render.LostFocussetter array in aUseRef(#131A).Intentionally not done
Element?[]): the array becomesGridElement.Childrenand is retained by the reconciler/VirtualList across diffs — returning it to anArrayPoolwould corrupt a live array. Documented in perf: cache per-render arrays/LINQ in DataGrid #663._editingVersion; zero benchmark impact (the workload does no editing). Documented in perf: cache per-render arrays/LINQ in DataGrid #663.Behavior preservation
Sorting, filtering, pagination, column show/hide/pin/resize, and range selection are unchanged. Every mutation path bumps the matching version so caches invalidate correctly.
Validation
dotnet build src/Reactor/Reactor.csproj -c Release— 0 warnings, 0 errors (no new AOT/trim warnings).dotnet test tests/Reactor.Tests— 9681 passed, 0 failed, including 11 newDataGridPerfCacheTests(cache invalidation on sort/filter/column/width change; O(1) lookups match the old LINQ; shared layout cache reference stability; visible-columns cache;SelectRangeByKeyCacheparity with the explicit-visibleOrderpath).--self-test --filter DataGrid, real WinUI window) — 0 failures (paging, scroll, edit lifecycle, edit mutation, search+sort).